Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
نویسندگان
چکیده
As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting conditional branches, to exploit more instruction-level parallelism by reducing the impact of memory delays. We examine the effects of the execution of loads down the wrong branch path on the performance of an aggressive issue processor. We find that, by continuing to execute the loads issued in the mispredicted path, even after the branch is resolved, we can actually reduce the cache misses observed on the correctly executed path. This wrong-path execution of loads can result in a speedup of up to 5% due to an indirect prefetching effect that brings data or instruction blocks into the cache for instructions subsequently issued on the correctly predicted path. However, it also can increase the amount of memory traffic and can pollute the cache. We propose the Wrong Path Cache (WPC) to eliminate the cache pollution caused by the execution of loads down mispredicted branch paths. For the configurations tested, fetching the results of wrong path loads into a fully associative 8-entry WPC can result in a 12% to 39% reduction in L1 data cache misses and in a speedup of up to 37%, with an average speedup of 9%, over the baseline processor.
منابع مشابه
The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture
Concurrent multithreaded architectures exploit both instructionlevel and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should also use thread-level control speculation in order to exploit ...
متن کاملUsing Incorrect Speculation to Prefetch Data in a Concurrent Multithreaded Processor
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. The resulting speculative issuing of load instructions in these architectures can significantly impact the performance of the memory hierarchy as the system exploits higher degrees of parallelism. In this study, we in...
متن کاملMispredicted Path Cache Effects
As superscalar pipelines become wider and deeper, the percentage of dynamic instructions fetched into the machine from the mispredicted path significantly increases. This paper discusses how a new cycle-accurate performance simulator is used to accurately measure mispredicted path effects on the cache hierarchy. Previously published results based on less accurate tools indicated that mispredict...
متن کاملA Study of Mispredicted Branches Dependent on Load Misses in Continual Flow Pipelines
Large instruction window processors can achieve high performance by supplying more instructions during long latency load misses, thus effectively hiding these latencies. Continual Flow Pipeline (CFP) architectures provide high-performance by effectively increasing the number of actively executing instructions without increasing the size of the cycle-critical structures. A CFP consists of a Slic...
متن کاملAVID: A Complexity Effective Front-End Architecture for Fast Memory Disambiguation and Early Data Prefetching
False dependency between load and store prevents independent load from executing. Without speculation, ambiguous load should wait until there is no address confliction of preceding store instruction, which cause unnecessary latency. On the other hand, the load can be executed speculatively, which costs the penalty when it turns out to be miss-predicted by back-end. We propose a complexity effec...
متن کامل